On Compound Poisson Approximation For Sequence Matching

Author

  • Marianne Månsson
Abstract

Consider the sequences $\{X_i\}_{i=1}^{m}$ and $\{Y_j\}_{j=1}^{n}$ of independent random variables, which take values in a finite alphabet, and assume that the variables $X_1, X_2, \ldots$ and $Y_1, Y_2, \ldots$ follow the distributions $\mu$ and $\nu$, respectively. Two variables $X_i$ and $Y_j$ are said to match if $X_i = Y_j$. Let $W$ denote the number of matching subsequences of length $k$ between the two sequences when $r$ mismatches, $0 \le r < k$, are allowed. In this paper we use Stein's method to bound the total variation distance between the distribution of $W$ and a suitably chosen compound Poisson distribution. To derive rates of convergence, the case where $E[W]$ stays bounded away from infinity and the case where $E[W] \to \infty$ as $m, n \to \infty$ have to be treated separately. Under the assumption that $\ln n / \ln(mn) \to \rho \in (0, 1)$, we give conditions on the rate at which $k \to \infty$, and on the distributions $\mu$ and $\nu$, for which the variation distance tends to zero.
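As a rough illustration of the quantity being approximated, the sketch below counts length-k windows of two sequences that agree in all but at most r positions. It rests on assumptions that the abstract does not confirm: every pair of start positions is counted separately, windows do not wrap around the ends of the sequences, and no declumping of overlapping matches is applied. The function name `count_matches` is hypothetical.

```python
def count_matches(x, y, k, r=0):
    """Count pairs of start positions (i, j) whose length-k windows
    x[i:i+k] and y[j:j+k] differ in at most r positions.

    Assumption: every pair (i, j) is counted on its own, with no
    wrap-around and no merging of overlapping matches.
    """
    if not 0 <= r < k:
        raise ValueError("need 0 <= r < k")
    count = 0
    for i in range(len(x) - k + 1):
        for j in range(len(y) - k + 1):
            # Number of positions at which the two windows disagree.
            mismatches = sum(a != b for a, b in zip(x[i:i + k], y[j:j + k]))
            if mismatches <= r:
                count += 1
    return count


if __name__ == "__main__":
    print(count_matches("ACGT", "CGTA", k=3, r=0))
```

This brute-force count takes on the order of m·n·k comparisons, which is adequate for small simulated examples. Overlapping matches tend to occur in clumps, which is one common motivation for a compound Poisson rather than a plain Poisson approximation.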


Similar resources

Compound Poisson approximation: a user’s guide

Compound Poisson approximation is a useful tool in a variety of applications, including insurance mathematics, reliability theory, and molecular sequence analysis. In this paper, we review the ways in which Stein's method can currently be used to derive bounds on the error in such approximations. The theoretical basis for the construction of error bounds is systematically discussed, and a numbe...


On the bounds in Poisson approximation for independent geometric distributed random variables

The main purpose of this note is to establish some bounds in Poisson approximation for row-wise arrays of independent geometrically distributed random variables using the operator method. Some results related to random sums of independent geometrically distributed random variables are also investigated.


A Compound Poisson Approximation Inequality

We give conditions under which the number of events which occur in a sequence of m-dependent events is stochastically smaller than a suitably defined compound Poisson random variable. The results are applied to counts of sequence pattern appearances and to system reliability. We also provide a numerical example.


On Runs in Independent Sequences

Given an i.i.d. sequence of n letters from a finite alphabet, we consider the length of the longest run of any letter. In the equiprobable case, results for this run turn out to be closely related to the well-known results for the longest run of a given letter. For coin-tossing, tail probabilities are compared for both kinds of runs via Poisson approximation.
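The two statistics compared in this abstract can be sketched as follows: the longest run of any letter versus the longest run of one given letter. The helper names `longest_run` and `longest_run_of` are hypothetical, and the code only computes the statistics; it does not reproduce the tail-probability comparison.

```python
from itertools import groupby


def longest_run(seq):
    """Length of the longest run of any single letter in seq (0 if empty)."""
    return max((sum(1 for _ in group) for _, group in groupby(seq)), default=0)


def longest_run_of(seq, letter):
    """Length of the longest run of one given letter in seq (0 if absent)."""
    return max(
        (sum(1 for _ in group) for key, group in groupby(seq) if key == letter),
        default=0,
    )


if __name__ == "__main__":
    tosses = "HHTHHHT"
    print(longest_run(tosses), longest_run_of(tosses, "T"))
```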


Normal and Compound Poisson Approximations for Pattern Occurrences in NGS Reads

Next generation sequencing (NGS) technologies are now widely used in many biological studies. In NGS, sequence reads are randomly sampled from the genome sequence of interest. Most computational approaches for NGS data first map the reads to the genome and then analyze the data based on the mapped reads. Since many organisms have unknown genome sequences and many reads cannot be uniquely mapped...



Journal:
  • Combinatorics, Probability & Computing

Volume 9  Issue 

Pages  -

Publication date 2000